219 research outputs found

    Classifying Process Instances Using Recurrent Neural Networks

    Process mining consists of techniques that transform logs created by operative systems into process models. Process mining tools often need to classify ongoing process instances, e.g., to predict how much longer a process will take to complete, or to assign process instances to classes based only on the activities that have occurred so far. Recurrent neural networks and their subclasses, such as Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM), have been shown to learn relevant temporal features for subsequent classification tasks. In this paper we apply recurrent neural networks to classifying process instances. The proposed model is trained in a supervised fashion using labeled process instances extracted from event log traces. To our knowledge, this is the first time GRU has been used to classify business process instances. Our main experimental results show that GRU trains remarkably faster than LSTM while giving almost identical accuracy. An additional contribution of our paper is a reduction in classification model training time achieved by filtering infrequent activities, a technique commonly used in, e.g., Natural Language Processing (NLP). Peer reviewed.
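
The abstract above mentions filtering infrequent activities to shorten training time. A minimal sketch of that preprocessing idea follows; the function name, the `min_count` threshold, the optional placeholder token, and the toy traces are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def filter_infrequent_activities(traces, min_count=2, placeholder=None):
    """Remove, or replace with a placeholder, activities seen fewer than
    min_count times across all traces (hypothetical helper)."""
    # Count how often each activity label occurs in the whole log.
    counts = Counter(activity for trace in traces for activity in trace)
    filtered = []
    for trace in traces:
        if placeholder is None:
            # Drop rare activities entirely.
            filtered.append([a for a in trace if counts[a] >= min_count])
        else:
            # Keep the sequence length and map rare activities to one token.
            filtered.append(
                [a if counts[a] >= min_count else placeholder for a in trace]
            )
    return filtered


# Toy event log: each trace is the ordered list of activities of one instance.
traces = [
    ["register", "check", "approve"],
    ["register", "check", "escalate", "approve"],
    ["register", "reject"],
]
dropped = filter_infrequent_activities(traces, min_count=2)
replaced = filter_infrequent_activities(traces, min_count=2, placeholder="<RARE>")
```

Both variants shrink the activity vocabulary, and with it the input encoding a sequence model has to learn. Whether to drop rare activities or map them to a shared token is a design choice the abstract does not specify.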

    Exploring Interpretability for Predictive Process Analytics

    Modern predictive analytics underpinned by machine learning techniques has become a key enabler of automated, data-driven decision making. In the context of business process management, predictive analytics has been applied to predicting the future state of an ongoing business process instance, for example, when the process instance will complete and what the outcome will be upon completion. Machine learning models can be trained on event log data recording historical process execution to build the underlying predictive models. Multiple techniques have been proposed so far that encode the information available in an event log and construct the input features required to train a predictive model. While accuracy has been a dominant criterion in the choice of techniques, they are often applied as black boxes in building predictive models. In this paper, we derive explanations using interpretable machine learning techniques to compare and contrast the suitability of multiple predictive models of high accuracy. The explanations allow us to understand the underlying reasons for a prediction and highlight scenarios where accuracy alone may not be sufficient for assessing the suitability of the techniques used to encode event log data into features for a predictive model. Findings from this study motivate the need to incorporate interpretability into predictive process analytics. Comment: 15 pages, 7 figures.

    Exploiting Event Log Event Attributes in RNN Based Prediction

    In predictive process analytics, current and historical process data in event logs are used to predict the future, e.g., the next activity or how much longer a process will take to complete. Recurrent neural networks (RNNs) and their subclasses have been shown to be well suited for creating prediction models. Thus far, event attributes have not been fully utilized in these models. The biggest challenge in exploiting them in prediction models is the potentially large number of event attributes and attribute values. We present a novel clustering technique that allows for trade-offs between prediction accuracy and the time needed for model training and prediction. As an additional finding, combining this clustering method with raw event attribute values in some cases provides even better prediction accuracy, at the cost of additional time required for training and prediction. Peer reviewed.

    XNAP: Making LSTM-based Next Activity Predictions Explainable by Using LRP

    Predictive business process monitoring (PBPM) is a class of techniques designed to predict behaviour, such as next activities, in running traces. PBPM techniques aim to improve process performance by providing predictions to process analysts, supporting them in their decision making. However, the limited predictive quality of PBPM techniques has been considered the essential obstacle to establishing such techniques in practice. With the use of deep neural networks (DNNs), the techniques' predictive quality could be improved for tasks like next activity prediction. While DNNs achieve a promising predictive quality, they still lack comprehensibility due to their hierarchical approach to learning representations. Nevertheless, process analysts need to comprehend the cause of a prediction to identify intervention mechanisms that might affect the decision making to secure process performance. In this paper, we propose XNAP, the first explainable, DNN-based PBPM technique for next activity prediction. XNAP integrates a layer-wise relevance propagation method from the field of explainable artificial intelligence to make the predictions of a long short-term memory DNN explainable by providing relevance values for activities. We show the benefit of our approach through two real-life event logs.

    Accurate and Transparent Path Prediction Using Process Mining

    Anticipating the next events of an ongoing series of activities has many compelling applications in various industries. It can be used to improve customer satisfaction, to enhance operational efficiency, and to streamline health-care services, to name a few. In this work, we propose an algorithm that predicts the next events by leveraging business process models obtained using process mining techniques. Because the predictions are built from business process models, business analysts can interpret and alter them. We tested our approach with more than 30 synthetic datasets as well as 6 real datasets. The results show superior accuracy compared to neural networks while being orders of magnitude faster.

    What’s in a Relationship: An Ontological Analysis

    In a series of publications, we have proposed a foundational system of ontological categories which has been successfully used to evaluate and improve the quality of conceptual modeling grammars and models. In this article, we continue this work by using this foundational ontology to provide real-world semantics and sound modeling guidelines for one of the most fundamental (and yet one of the most problematic) constructs in conceptual modeling, namely, the relationship type. In addition, we systematically compare our approach with a classical ontological treatment of this construct in the literature, provided by the BWW framework.

    Evaluation of a commercial E(rns)-capture ELISA for detection of BVDV in routine diagnostic cattle serum samples

    BACKGROUND: Bovine viral diarrhoea virus (BVDV) is an important pathogen in cattle. The ability of the virus to cross the placenta during early pregnancy can result in the birth of persistently infected (PI) calves. These calves shed the virus during their entire lifespan and are the key transmitters of infection. Consequently, identification (and subsequent removal) of PI animals is necessary to rapidly clear infected herds of the virus. The objective of this study was to evaluate the suitability of a commercial E(rns)-capture ELISA, in comparison to the indirect immunoperoxidase test (IPX), for routine diagnostic detection of BVDV within a control programme. In addition, the effect of passive immunity and heat-inactivation of the samples on the performance of the ELISA was studied. METHODS: In the process of virus clearance within the Swedish BVDV control programme, all calves born in infected herds are tested for virus and antibodies. From such samples, sent in for routine diagnostics to SVA, we selected 220 sera collected from 32 beef herds and 29 dairy herds. All sera were tested for BVDV antigen using the E(rns) ELISA, and the results were compared to the results from the IPX used within the routine diagnostics. RESULTS: All 130 samples categorized as virus negative by IPX tested negative in the ELISA, and all 90 samples categorized as virus positive tested positive, i.e., the relative sensitivity and specificity of the ELISA were 100% in relation to IPX, and the agreement between the tests was perfect. CONCLUSION: We conclude that the E(rns) ELISA is a valid alternative that has several advantages compared to IPX. Our results clearly demonstrate that it performs well under Swedish conditions, and that its performance is comparable with the IPX test. It is highly sensitive and specific, can be used for testing of heat-inactivated samples and for precolostral testing, and can probably detect PI animals at an earlier age than the IPX.
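
The reported figures can be checked with a short calculation. The sketch below treats IPX as the reference test and uses the counts stated in the abstract (90 positive and 130 negative by both tests, no discordant samples). Using Cohen's kappa as the agreement measure is an assumption here, since the abstract only says the agreement was perfect. The function name is hypothetical.

```python
def relative_performance(tp, fp, fn, tn):
    """Relative sensitivity, specificity and Cohen's kappa of a test
    against a reference test, from the four cells of a 2x2 table."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)  # reference positives also positive in the test
    specificity = tn / (tn + fp)  # reference negatives also negative in the test
    observed = (tp + tn) / n      # observed proportion of agreement
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Counts from the abstract: ELISA versus IPX on 220 sera.
sensitivity, specificity, kappa = relative_performance(tp=90, fp=0, fn=0, tn=130)
print(sensitivity, specificity, kappa)
```

With no discordant samples, all three values equal 1, which matches the 100% relative sensitivity and specificity reported above.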

    Predictive Process Monitoring Methods: Which One Suits Me Best?

    Predictive process monitoring has recently gained traction in academia and is also maturing in companies. However, with the growing body of research, it can be daunting for companies to navigate this domain and determine, given certain data, what can be predicted and which methods to use. The main objective of this paper is to develop a value-driven framework for classifying existing work on predictive process monitoring. This objective is achieved by systematically identifying, categorizing, and analyzing existing approaches to predictive process monitoring. The review is then used to develop a value-driven framework that helps organizations navigate the predictive process monitoring field, find value, and exploit the opportunities enabled by these analysis techniques.

    Benchmarking Ontologies: Bigger or Better?

    A scientific ontology is a formal representation of knowledge within a domain, typically including central concepts, their properties, and relations. With the rise of computers and high-throughput data collection, ontologies have become essential to data mining and sharing across communities in the biomedical sciences. Powerful approaches exist for testing the internal consistency of an ontology, but not for assessing the fidelity of its domain representation. We introduce a family of metrics that describe the breadth and depth with which an ontology represents its knowledge domain. We then test these metrics using (1) four of the most common medical ontologies with respect to a corpus of medical documents and (2) seven of the most popular English thesauri with respect to three corpora that sample language from medicine, news, and novels. Here we show that our approach captures the quality of ontological representation and guides efforts to narrow the breach between ontology and collective discourse within a domain. Our results also demonstrate key features of medical ontologies, English thesauri, and discourse from different domains. Medical ontologies have a small intersection, as do English thesauri. Moreover, dialects characteristic of distinct domains vary strikingly, as many of the same words are used quite differently in medicine, news, and novels. As ontologies are intended to mirror the state of knowledge, our methods to tighten the fit between ontology and domain will increase their relevance for new areas of biomedical science and improve the accuracy and power of inferences computed across them.

    Feline Leukemia Virus and Other Pathogens as Important Threats to the Survival of the Critically Endangered Iberian Lynx (Lynx pardinus)

    BACKGROUND: The Iberian lynx (Lynx pardinus) is considered the most endangered felid species in the world. In order to save this species, the Spanish authorities implemented a captive breeding program recruiting lynxes from the wild. In this context, a retrospective survey on the prevalence of selected feline pathogens in free-ranging lynxes was initiated. METHODOLOGY/PRINCIPAL FINDINGS: We systematically analyzed the prevalence and importance of seven viral, one protozoan (Cytauxzoon felis), and several bacterial (e.g., hemotropic mycoplasma) infections in 77 of approximately 200 remaining free-ranging Iberian lynxes of the Doñana and Sierra Morena areas, in Southern Spain, between 2003 and 2007. With the exception of feline immunodeficiency virus (FIV), evidence of infection by all tested feline pathogens was found in Iberian lynxes. Fourteen lynxes were feline leukemia virus (FeLV) provirus-positive; eleven of these were antigenemic (FeLV p27 positive). All 14 animals tested negative for other viral infections. During a six-month period in 2007, six of the provirus-positive antigenemic lynxes died. Infection with FeLV but not with other infectious agents was associated with mortality (p<0.001). Sequencing of the FeLV surface glycoprotein gene revealed a common origin for ten of the eleven samples. The ten sequences were closely related to FeLV-A/61E, originally isolated from cats in the USA. Endogenous FeLV sequences were not detected. CONCLUSIONS/SIGNIFICANCE: It was concluded that the FeLV infection most likely originated from domestic cats invading the lynx's habitats. Data available regarding the time frame, co-infections, and outcome of FeLV infections suggest that, in contrast to the domestic cat, the FeLV strain affecting the lynxes in 2007 is highly virulent to this species. Our data argue strongly for vaccination of lynxes and domestic cats in and around the lynx's habitats in order to prevent further spread of the virus, as well as for reduction of the domestic cat population, if the lynx population is to be maintained.